AAAI.2021 - Humans and AI | Cool Papers - Immersive Paper Discovery

#1 Automated Storytelling via Causal, Commonsense Plot Ordering

Authors: Prithviraj Ammanabrolu ; Wesley Cheung ; William Broniec ; Mark O. Riedl

Automated story plot generation is the task of generating a coherent sequence of plot events. Causal relations between plot events are believed to increase the perception of story and plot coherence. In this work, we introduce the concept of soft causal relations as causal relations inferred from commonsense reasoning. We demonstrate C2PO, an approach to narrative generation that operationalizes this concept through Causal, Commonsense Plot Ordering. Using human-participant protocols, we evaluate our system against baseline systems with different commonsense reasoning and inductive biases to determine the role of soft causal relations in perceived story quality. Through these studies, we also probe how changes in commonsense norms across storytelling genres affect perceptions of story quality.

#2 MARTA: Leveraging Human Rationales for Explainable Text Classification

Authors: Ines Arous ; Ljiljana Dolamic ; Jie Yang ; Akansha Bhardwaj ; Giuseppe Cuccu ; Philippe Cudré-Mauroux

Explainability is a key requirement for text classification in many application domains ranging from sentiment analysis to medical diagnosis or legal reviews. Existing methods often rely on "attention" mechanisms to explain classification results by estimating the relative importance of input units. However, recent studies have shown that such mechanisms tend to misidentify irrelevant input units in their explanations. In this work, we propose a hybrid human-AI approach that incorporates human rationales into attention-based text classification models to improve the explainability of classification results. Specifically, we ask workers to provide rationales for their annotations by selecting relevant pieces of text. We introduce MARTA, a Bayesian framework that jointly learns an attention-based model and the reliability of workers while injecting human rationales into model training. We derive a principled optimization algorithm based on variational inference with efficient updating rules for learning MARTA parameters. Extensive validation on real-world datasets shows that our framework significantly improves on the state of the art in terms of both classification explainability and accuracy.

#3 Human Uncertainty Inference via Deterministic Ensemble Neural Networks

Authors: Yujin Cha ; Sang Wan Lee

The estimation and inference of human predictive uncertainty have great potential to improve the sampling efficiency and prediction reliability of human-in-the-loop systems for smart healthcare, smart education, and human-computer interaction. Predictive uncertainty in humans is highly interpretable, but it is difficult to measure. By contrast, the predictive uncertainty of machine learning models, albeit poorly interpretable, is relatively easy to access. Here, we demonstrate that the poor accessibility of human uncertainty can be resolved by exploiting simple and universally accessible deterministic neural networks. We propose a new model for human uncertainty inference, called the proxy ensemble network (PEN). Simulations on a few benchmark datasets demonstrated that the model can efficiently learn human uncertainty from a small amount of data. To show its applicability to real-world problems, we performed behavioral experiments in which 64 physicians classified medical images and reported their level of confidence. We showed that the PEN could predict both the uncertainty range and the diagnoses given by subjects with high accuracy. Our results demonstrate the ability of machine learning to guide human decision making and to help humans learn more efficiently and accurately. To the best of our knowledge, this is the first study to explore the possibility of accessing human uncertainty through the lens of deterministic neural networks.
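
As a generic illustration only (not the PEN architecture, whose details the abstract does not give), predictive uncertainty can be read off an ensemble of deterministic classifiers by averaging their softmax outputs and taking the entropy of the mean. The member outputs below are hypothetical numbers.

```python
import math
from typing import List, Sequence, Tuple


def mean_probs(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by the ensemble members."""
    n = len(member_probs)
    return [sum(col) / n for col in zip(*member_probs)]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; terms with zero probability contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def ensemble_uncertainty(member_probs: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (predicted class, entropy of the averaged prediction)."""
    p = mean_probs(member_probs)
    pred = max(range(len(p)), key=p.__getitem__)
    return pred, entropy(p)


# Hypothetical softmax outputs of three deterministic networks for two images.
agree = [[0.9, 0.1], [0.85, 0.15], [0.95, 0.05]]
disagree = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(ensemble_uncertainty(agree))     # low entropy: members agree
print(ensemble_uncertainty(disagree))  # higher entropy: members disagree
```

Entropy of the averaged prediction is only one common proxy; variance across members or mutual information are alternatives with the same structure.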

#4 Learning to Sit: Synthesizing Human-Chair Interactions via Hierarchical Control

Authors: Yu-Wei Chao ; Jimei Yang ; Weifeng Chen ; Jia Deng

Recent progress in physics-based character animation has shown impressive breakthroughs in human motion synthesis through imitating motion capture data via deep reinforcement learning. However, results have mostly been demonstrated on imitating a single distinct motion pattern, and do not generalize to interactive tasks that require flexible motion patterns due to varying human-object spatial configurations. To bridge this gap, we focus on one class of interactive tasks: sitting onto a chair. We propose a hierarchical reinforcement learning framework which relies on a collection of subtask controllers trained to imitate simple, reusable mocap motions, and a meta controller trained to execute the subtasks properly to complete the main task. We experimentally demonstrate the strength of our approach over different non-hierarchical and hierarchical baselines. We also show that our approach can be applied to motion prediction given an image input. A supplementary video can be found at https://youtu.be/3CeN0OGz2cA.

#5 User Driven Model Adjustment via Boolean Rule Explanations

Authors: Elizabeth M. Daly ; Massimiliano Mattetti ; Öznur Alkan ; Rahul Nair

AI solutions are heavily dependent on the quality and accuracy of the input training data; however, the training data may not always fully reflect the most up-to-date policy landscape or may be missing business logic. Advances in explainability have opened the possibility of allowing users to interact with interpretable explanations of ML predictions in order to inject modifications or constraints that more accurately reflect the current realities of the system. In this paper, we present a solution which leverages the predictive power of ML models while allowing the user to specify modifications to decision boundaries. Our interactive overlay approach achieves this goal without requiring model retraining, making it appropriate for systems that need to apply instant changes to their decision making. We demonstrate that user feedback rules can be layered with the ML predictions to provide immediate changes, which in turn supports learning with less data.
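
A minimal sketch of the overlay idea, assuming that a user rule is a conjunction of Boolean conditions mapped to a forced label and that the first matching rule wins; the class, function names, and loan example are hypothetical and not the authors' system. The underlying model is called unchanged, so no retraining is involved.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class OverrideRule:
    """Hypothetical user rule: if every condition holds, force `label`."""
    conditions: List[Callable[[Record], bool]]
    label: Any

    def matches(self, x: Record) -> bool:
        return all(cond(x) for cond in self.conditions)


def overlay_predict(model: Callable[[Record], Any],
                    rules: List[OverrideRule], x: Record) -> Any:
    """Return the label of the first matching rule, else the model's output.

    The model itself is never modified or retrained.
    """
    for rule in rules:
        if rule.matches(x):
            return rule.label
    return model(x)


# Hypothetical trained model and a new policy: large loans now always need review.
def model(x: Record) -> str:
    return "approve" if x["score"] > 0.5 else "reject"


rules = [OverrideRule([lambda x: x["amount"] > 50_000], "review")]
print(overlay_predict(model, rules, {"score": 0.9, "amount": 80_000}))  # review
print(overlay_predict(model, rules, {"score": 0.9, "amount": 10_000}))  # approve
```

First-match precedence is one simple conflict-resolution choice; a real system would also need a policy for overlapping or contradictory rules.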

#6 Classification Under Human Assistance

Authors: Abir De ; Nastaran Okati ; Ali Zarezade ; Manuel Gomez Rodriguez

Most supervised learning models are trained for full automation. However, their predictions are sometimes worse than those of human experts on some specific instances. Motivated by this empirical observation, our goal is to design classifiers that are optimized to operate under different automation levels. More specifically, we focus on convex margin-based classifiers and first show that the problem is NP-hard. Then, we further show that, for support vector machines, the corresponding objective function can be expressed as the difference of two functions f = g - c, where g is monotone, non-negative and gamma-weakly submodular, and c is non-negative and modular. This representation allows a recently introduced deterministic greedy algorithm, as well as a more efficient randomized variant of the algorithm, to enjoy approximation guarantees when solving the problem. Experiments on synthetic and real-world data from several applications in medical diagnosis illustrate our theoretical findings and demonstrate that, under human assistance, supervised learning models trained to operate under different automation levels can outperform those trained for full automation as well as humans operating alone.
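
The abstract does not name the greedy algorithm; we assume it is a distorted-greedy method for objectives of the form g - c, which down-weights the marginal gain of g in early iterations. The sketch below maximizes g(S) - c(S) subject to |S| <= k, with a toy coverage function standing in for g; in the paper's setting S would be the set of instances handed to human experts, which this sketch does not model.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List


def distorted_greedy(ground: Iterable[Hashable],
                     g: Callable[[FrozenSet], float],
                     cost: Dict[Hashable, float],
                     k: int,
                     gamma: float = 1.0) -> List[Hashable]:
    """Approximately maximize g(S) - sum(cost[e] for e in S) with |S| <= k.

    g is assumed monotone, non-negative and gamma-weakly submodular; the
    cost term is modular. In iteration i the marginal gain of g is scaled
    by (1 - gamma / k) ** (k - (i + 1)).
    """
    ground = list(ground)
    S: FrozenSet = frozenset()
    chosen: List[Hashable] = []
    for i in range(k):
        factor = (1 - gamma / k) ** (k - (i + 1))
        candidates = [e for e in ground if e not in S]
        if not candidates:
            break

        def score(e: Hashable) -> float:
            return factor * (g(S | {e}) - g(S)) - cost[e]

        best = max(candidates, key=score)
        if score(best) > 0:  # add only elements with positive distorted gain
            S = S | {best}
            chosen.append(best)
    return chosen


# Toy instance: g counts covered items (monotone submodular), costs are modular.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
cost = {"a": 1.0, "b": 0.4, "c": 2.0}


def coverage(S: FrozenSet) -> float:
    return float(len(set().union(*(covers[e] for e in S))))


print(distorted_greedy(covers, coverage, cost, k=2))  # ['b', 'a']
```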

#7 Wasserstein Distributionally Robust Inverse Multiobjective Optimization

Authors: Chaosheng Dong ; Bo Zeng

Inverse multiobjective optimization provides a general framework for the unsupervised learning task of inferring parameters of a multiobjective decision making problem (DMP), based on a set of observed decisions from a human expert. However, the performance of this framework relies critically on the availability of an accurate DMP, sufficient decisions of high quality, and a parameter space that contains enough information about the DMP. To hedge against the uncertainties in the hypothetical DMP, the data, and the parameter space, in this paper we investigate a distributionally robust approach to inverse multiobjective optimization. Specifically, we leverage the Wasserstein metric to construct a ball centered at the empirical distribution of these decisions. We then formulate a Wasserstein distributionally robust inverse multiobjective optimization problem (WRO-IMOP) that minimizes a worst-case expected loss function, where the worst case is taken over all distributions in the Wasserstein ball. We show that the excess risk of the WRO-IMOP estimator has a sub-linear convergence rate. Furthermore, we propose semi-infinite reformulations of the WRO-IMOP and develop a cutting-plane algorithm that converges to an approximate solution in a finite number of iterations. Finally, we demonstrate the effectiveness of our method on both a synthetic multiobjective quadratic program and a real-world portfolio optimization problem.
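
To make "a ball centered at the empirical distribution" concrete, the sketch below computes the 1-Wasserstein distance between two uniform empirical distributions on the real line with the same number of samples, where the optimal coupling simply matches sorted samples. Decisions in the paper are vectors and the worst-case expectation over the ball is not computed here; the scalar data and radius are hypothetical.

```python
from typing import Sequence


def wasserstein1_empirical(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W1 distance between two uniform empirical distributions on the real
    line with equally many atoms: pair the sorted samples and average |x - y|."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def in_wasserstein_ball(center: Sequence[float], candidate: Sequence[float],
                        radius: float) -> bool:
    """Is the candidate empirical distribution within `radius` of the center?"""
    return wasserstein1_empirical(center, candidate) <= radius


observed = [1.0, 2.0, 4.0]  # hypothetical observed (scalar) decisions
shifted = [1.5, 2.5, 4.5]   # every atom moved by 0.5
print(wasserstein1_empirical(observed, shifted))          # 0.5
print(in_wasserstein_ball(observed, shifted, radius=0.3))  # False
```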

#8 Illuminating Mario Scenes in the Latent Space of a Generative Adversarial Network

Authors: Matthew C. Fontaine ; Ruilin Liu ; Ahmed Khalifa ; Jignesh Modi ; Julian Togelius ; Amy K. Hoover ; Stefanos Nikolaidis

Generative adversarial networks (GANs) are quickly becoming a ubiquitous approach to procedurally generating video game levels. While GAN-generated levels are stylistically similar to human-authored examples, human designers often want to explore the generative design space of GANs to extract interesting levels. However, human designers find latent vectors opaque and would rather explore along dimensions the designer specifies, such as number of enemies or obstacles. We propose using state-of-the-art quality diversity algorithms designed to optimize continuous spaces, i.e., MAP-Elites with a directional variation operator and Covariance Matrix Adaptation MAP-Elites, to efficiently explore the latent space of a GAN to extract levels that vary across a set of specified gameplay measures. In the benchmark domain of Super Mario Bros, we demonstrate how designers may specify gameplay measures to our system and extract high-quality (playable) levels with a diverse range of level mechanics, while still maintaining stylistic similarity to human-authored examples. An online user study shows how the different mechanics of the automatically generated levels affect subjective ratings of their perceived difficulty and appearance.
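
A minimal MAP-Elites loop with a directional (Iso+LineDD-style) variation operator, sketched under our own assumptions: a toy `toy_evaluate` stands in for "decode the latent vector with the GAN, play the level, and measure it", the step sizes are illustrative, and CMA-ME is not shown. The archive keeps the best-fitness latent vector found in each cell of a grid over two hypothetical gameplay measures.

```python
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Cell = Tuple[int, ...]


def iso_line_dd(x: Vector, y: Vector, rng: random.Random,
                sigma_iso: float = 0.01, sigma_line: float = 0.2) -> Vector:
    """Isotropic Gaussian noise plus a random step along the line from x
    toward another elite y (one shared scalar step for all coordinates)."""
    t = rng.gauss(0.0, sigma_line)
    return [xi + rng.gauss(0.0, sigma_iso) + t * (yi - xi) for xi, yi in zip(x, y)]


def map_elites(evaluate: Callable[[Vector], Tuple[float, Cell]],
               dim: int, n_init: int, n_iters: int,
               seed: int = 0) -> Dict[Cell, Tuple[float, Vector]]:
    """Keep the best-fitness solution found in each cell of the measure grid."""
    if n_init < 1:
        raise ValueError("need at least one random solution to seed the archive")
    rng = random.Random(seed)
    archive: Dict[Cell, Tuple[float, Vector]] = {}

    def try_insert(z: Vector) -> None:
        fitness, cell = evaluate(z)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, z)

    for _ in range(n_init):  # random latent vectors seed the archive
        try_insert([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(n_iters):  # vary elites and try to improve or fill cells
        elites = [z for _, z in archive.values()]
        x, y = rng.choice(elites), rng.choice(elites)
        try_insert(iso_line_dd(x, y, rng))
    return archive


def toy_evaluate(z: Vector) -> Tuple[float, Cell]:
    """Hypothetical stand-in: quality score and two measures binned into 5 cells."""
    fitness = -sum(v * v for v in z)
    cell = tuple(min(4, max(0, int(v + 2.5))) for v in z[:2])
    return fitness, cell


archive = map_elites(toy_evaluate, dim=4, n_init=20, n_iters=500)
print(len(archive), "of 25 cells filled")
```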

#9 ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces

Authors: Zecheng He ; Srinivas Sunkara ; Xiaoxue Zang ; Ying Xu ; Lijuan Liu ; Nevan Wichers ; Gabriel Schubiner ; Ruby Lee ; Jindong Chen

As mobile devices are becoming ubiquitous, regularly interacting with a variety of user interfaces (UIs) is a common aspect of daily life for many people. To improve the accessibility of these devices and to enable their usage in a variety of settings, building models that can assist users and accomplish tasks through the UI is vitally important. However, there are several challenges in achieving this. First, UI components of similar appearance can have different functionalities, making understanding their function more important than just analyzing their appearance. Second, domain-specific features like the Document Object Model (DOM) in web pages and the View Hierarchy (VH) in mobile applications provide important signals about the semantics of UI elements, but these features are not in a natural language format. Third, owing to a large diversity in UIs and the absence of standard DOM or VH representations, building a UI understanding model with high coverage requires large amounts of training data. Inspired by the success of pre-training based approaches in NLP for tackling a variety of problems in a data-efficient way, we introduce a new pre-trained UI representation model called ActionBert. Our methodology is designed to leverage visual, linguistic and domain-specific features in user interaction traces to pre-train generic feature representations of UIs and their components. Our key intuition is that user actions, e.g., a sequence of clicks on different UI components, reveal important information about their functionality. We evaluate the proposed model on a wide variety of downstream tasks, ranging from icon classification to UI component retrieval based on its natural language description. Experiments show that the proposed ActionBert model outperforms multi-modal baselines across all downstream tasks by up to 15.5%.

#10 Goal Blending for Responsive Shared Autonomy in a Navigating Vehicle

Authors: Yu-Sian Jiang ; Garrett Warnell ; Peter Stone

Human-robot shared autonomy techniques for vehicle navigation hold promise for reducing a human driver’s workload, ensuring safety, and improving navigation efficiency. However, because typical techniques achieve these improvements by effectively removing human control at critical moments, these approaches often exhibit poor responsiveness to human commands, especially in cluttered environments. In this paper, we propose a novel goal-blending shared autonomy (GBSA) system, which aims to improve responsiveness in shared autonomy systems by blending human and robot input during the selection of local navigation goals as opposed to low-level motor (servo-level) commands. We validate the proposed approach by performing a human study involving an intelligent wheelchair and compare GBSA to a representative servo-level shared control system that uses a policy-blending approach. The results of both quantitative performance analysis and a subjective survey show that GBSA exhibits significantly better system responsiveness and induces higher user satisfaction than the existing approach.
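
The contrast between the two blending levels can be sketched as follows. `policy_blend` is the usual linear mixing of human and robot commands used by servo-level policy blending; `goal_blend` is a hypothetical goal-level variant whose scoring rule (cosine agreement between the joystick direction and each candidate goal, mixed with a robot-side score such as clearance) is our assumption, not the actual GBSA formulation. The selected goal would then be passed to a local planner, which is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def policy_blend(u_human: Point, u_robot: Point, alpha: float) -> Point:
    """Servo-level policy blending: linearly mix the two velocity commands."""
    return tuple(alpha * h + (1 - alpha) * r for h, r in zip(u_human, u_robot))


def goal_blend(pos: Point, joystick: Point, candidate_goals: List[Point],
               robot_scores: List[float], alpha: float) -> Point:
    """Hypothetical goal-level blending: pick the candidate local goal with the
    best mix of human agreement and robot-side score."""
    def agreement(goal: Point) -> float:
        dx, dy = goal[0] - pos[0], goal[1] - pos[1]
        norm = math.hypot(dx, dy) * math.hypot(*joystick)
        return (dx * joystick[0] + dy * joystick[1]) / norm if norm else 0.0

    scores = [alpha * agreement(g) + (1 - alpha) * s
              for g, s in zip(candidate_goals, robot_scores)]
    return candidate_goals[max(range(len(scores)), key=scores.__getitem__)]


print(policy_blend((1.0, 0.0), (0.0, 1.0), 0.5))  # (0.5, 0.5)
print(goal_blend((0.0, 0.0), (1.0, 0.0), [(1.0, 0.0), (0.0, 1.0)], [0.5, 0.5], 0.7))
```

Blending at the goal level means the human's input chooses *where* to go while the robot still controls *how*, rather than averaging two motor commands that may point in conflicting directions.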

#11 Contrastive Adversarial Learning for Person Independent Facial Emotion Recognition

Authors: Daeha Kim ; Byung Cheol Song

Since most facial emotion recognition (FER) methods rely heavily on supervision information, they are limited in their ability to analyze emotions independently of persons. On the other hand, adversarial learning is a well-known approach for generalized representation learning because it does not require supervision information. This paper presents a new adversarial learning scheme for FER. In detail, the proposed scheme enables the FER network to better understand complex emotional elements inherent in strong emotions by adversarially learning weak emotion samples based on strong emotion samples. As a result, the proposed method can recognize emotions independently of persons because it understands facial expressions more accurately. In addition, we propose a contrastive loss function for efficient adversarial learning. Finally, the proposed adversarial learning scheme was verified theoretically, and experiments show that it achieves state-of-the-art (SOTA) performance.

#12 AI-Assisted Scientific Data Collection with Iterative Human Feedback

Authors: Travis Mandel ; James Boyd ; Sebastian J. Carter ; Randall H. Tanaka ; Taishi Nammoto

Although artificial intelligence has revolutionized data analysis, significantly less work has focused on using AI to improve scientific data collection. Past work in AI for data collection has typically assumed that the objective function is well defined by humans before an experiment starts; however, this is a poor fit for scientific domains where new discoveries and insights are made as data is being collected. In this paper, we present a new framework that allows AI systems to work together with humans (e.g., scientists) to collect data more effectively in simple scientific domains. We present a novel algorithm, TESA, which seeks to achieve good performance by learning from past human behavior how to direct data collection toward places that are likely to become scientifically interesting in the future. We analyze the problem theoretically, defining a novel notion of regret in this setting and showing that TESA is a zero-regret algorithm. Next, we show that TESA outperforms other related algorithms in simulations using real data drawn from three diverse domains (economics, mental health, and cognitive psychology). Finally, we run experiments with human subjects across these scientific domains to compare our iterative human-in-the-loop process to a (more standard) workflow in which information is communicated to the AI a priori.

#13 Improving the Performance-Compatibility Tradeoff with Personalized Objective Functions

Authors: Jonathan Martinez ; Kobi Gal ; Ece Kamar ; Levi H. S. Lelis

AI systems that model and interact with their users can update their models over time to reflect new information and changes in the environment. Although these updates may improve the overall performance of the AI system, they may actually hurt performance with respect to individual users. Prior work has studied the tradeoff between improving the system’s performance following an update and the compatibility of the updated system with prior user experience. The more the model is forced to be compatible with a prior version, the greater the loss in performance it will incur. This paper challenges this assumption by showing that, by personalizing the loss function to specific users, it is possible to increase the prediction performance of the AI system while sacrificing less compatibility for these users. Our approach updates the sample weights to reflect their contribution to the compatibility of the model for a particular user following the update. We construct a portfolio of different models that vary in how they personalize the loss function for a user. We select the best model to use for a target user based on a validation set. We apply this approach to three supervised learning tasks commonly used in the human-computer decision-making literature. We show that using our approach leads to significant improvements in the performance-compatibility tradeoff over the non-personalized approach of Bansal et al., achieving up to 300% improvement for certain users. We present several use cases that illustrate the difference between the personalized and non-personalized approaches for two of our domains.
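
The compatibility side of the tradeoff can be measured per user as the fraction of that user's examples the old model got right that the updated model still gets right (the backward-compatibility score associated with Bansal et al.). The sketch below computes it alongside accuracy on hypothetical predictions; returning 1.0 when the old model was never right is our own convention, and the paper's sample-reweighting step is not shown.

```python
from typing import Sequence


def compatibility(old_pred: Sequence, new_pred: Sequence, labels: Sequence) -> float:
    """Share of examples the old model got right that the new model also gets right."""
    old_right = [o == y for o, y in zip(old_pred, labels)]
    both_right = sum(1 for ok, n, y in zip(old_right, new_pred, labels) if ok and n == y)
    n_old_right = sum(old_right)
    # Assumed convention: nothing to be incompatible with if the old model was never right.
    return both_right / n_old_right if n_old_right else 1.0


def accuracy(pred: Sequence, labels: Sequence) -> float:
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)


# Hypothetical predictions on one user's examples.
labels = [1, 0, 1, 1, 0]
old = [1, 0, 0, 1, 1]  # correct on examples 0, 1, 3
new = [1, 1, 1, 1, 0]  # correct on examples 0, 2, 3, 4
print(accuracy(old, labels), accuracy(new, labels))  # 0.6 0.8
print(compatibility(old, new, labels))  # 2 of the 3 previously correct kept
```

Here the update raises this user's accuracy while breaking one example they had learned to trust, which is exactly the tension a personalized loss tries to reduce.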

#14 Indecision Modeling

Authors: Duncan C. McElfresh ; Lok Chan ; Kenzie Doyle ; Walter Sinnott-Armstrong ; Vincent Conitzer ; Jana Schaich Borg ; John P. Dickerson

AI systems are often used to make or contribute to important decisions in a growing range of applications, including criminal justice, hiring, and medicine. Since these decisions impact human lives, it is important that the AI systems act in ways which align with human values. Techniques for preference modeling and social choice help researchers learn and aggregate people's preferences, which are used to guide AI behavior; thus, it is imperative that these learned preferences are accurate. These techniques often assume that people are willing to express strict preferences over alternatives, which is not true in practice. People are often indecisive, especially when their decision has moral implications. The philosophy and psychology literature shows that indecision is a measurable and nuanced behavior, and that there are several different reasons people are indecisive. This complicates the task of both learning and aggregating preferences, since most of the relevant literature makes restrictive assumptions about the meaning of indecision. We begin to close this gap by formalizing several mathematical indecision models based on theories from philosophy, psychology, and economics; these models can be used to describe (indecisive) agent decisions, both when they are allowed to express indecision and when they are not. We test these models using data collected from an online survey where participants choose how to (hypothetically) allocate organs to patients waiting for a transplant.
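
One simple way to make "indecision" precise, sketched here as a hypothetical example rather than one of the paper's models, is a minimum-difference threshold: the agent reports indecision when two alternatives' utilities are closer than a threshold. The second function covers the case where indecision cannot be expressed, by assuming a uniformly random choice among the near-tied options.

```python
import random


def choose(u_a: float, u_b: float, threshold: float) -> str:
    """Hypothetical threshold model: indecisive if utilities differ by less than threshold."""
    if abs(u_a - u_b) < threshold:
        return "indecisive"
    return "a" if u_a > u_b else "b"


def forced_choice(u_a: float, u_b: float, threshold: float,
                  rng: random.Random) -> str:
    """When indecision cannot be expressed, assume a uniformly random pick."""
    c = choose(u_a, u_b, threshold)
    return rng.choice(["a", "b"]) if c == "indecisive" else c


# Hypothetical utilities for two transplant candidates.
print(choose(1.0, 1.05, threshold=0.1))  # indecisive
print(choose(2.0, 1.0, threshold=0.1))   # a
print(forced_choice(1.0, 1.05, 0.1, random.Random(0)))
```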

#15 Narrative Plan Generation with Self-Supervised Learning

Authors: Mihai Polceanu ; Julie Porteous ; Alan Lindsay ; Marc Cavazza

Narrative generation has attracted significant interest as a novel application of Automated Planning techniques. However, the vast amount of narrative material available opens the way to the use of Deep Learning techniques. In this paper, we explore the feasibility of narrative generation through self-supervised learning, using sequence embedding techniques or auto-encoders to produce narrative sequences. We use datasets of well-formed plots generated by a narrative planning approach, using pre-existing, published narrative planning domains, to train generative models. Our experiments demonstrate the ability of generative sequence models to produce narrative plots with structure similar to those obtained with planning techniques, but with significant plot novelty compared with the training set. Most importantly, generated plots share structural properties associated with narrative quality measures used in planning-based methods. As plan-based structures account for a higher level of causality and narrative consistency, this suggests that our approach is able to extend a set of narratives with novel sequences that display the same high-level narrative properties. Unlike methods developed to extend sets of textual narratives, ours operates at the level of plot structure. Thus, it has the potential to be used across various media for plots of significant complexity, although it is initially limited to training and generation within the same narrative genre.

#16 Uncertain Graph Neural Networks for Facial Action Unit Detection

Authors: Tengfei Song ; Lisha Chen ; Wenming Zheng ; Qiang Ji

Capturing the dependencies among different facial action units (AUs) is extremely important for the AU detection task. Many studies have employed graph-based deep learning methods to exploit the dependencies among AUs. However, the dependencies among AUs in real-world data are often noisy, so it is essential to take their uncertainty into consideration. Rather than employing a deterministic model, we propose an uncertain graph neural network (UGN) that learns a probabilistic mask which simultaneously captures both the individual dependencies among AUs and their uncertainties. Further, we propose an adaptive weighted loss function based on the epistemic uncertainties, which adaptively varies the weights of the training samples during training to account for unbalanced data distributions among AUs. We also provide an analysis of how the uncertainties relate to AU detection performance. Extensive experiments on two benchmark datasets, BP4D and DISFA, demonstrate that our method achieves state-of-the-art performance.

#17 Learning Rewards From Linguistic Feedback

Authors: Theodore R. Sumers ; Mark K. Ho ; Robert D. Hawkins ; Karthik Narasimhan ; Thomas L. Griffiths

We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g., commands). We propose a general framework which does not make this assumption, instead using aspect-based sentiment analysis to decompose feedback into sentiment over the features of a Markov decision process. We then infer the teacher's reward function by regressing the sentiment on the features, an analogue of inverse reinforcement learning. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict rewards. We then re-run our initial experiment, pairing human teachers with these artificial learners. All three models successfully learn from interactive human feedback. The inference network approaches the performance of the "literal" sentiment model, while the "pragmatic" model nears human performance. Our work provides insight into the information structure of naturalistic linguistic feedback as well as methods to leverage it for reinforcement learning.
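
The step "infer the teacher's reward function by regressing the sentiment on the features" can be sketched as ordinary least squares, which is our assumption (the abstract does not name the regressor). Each row of `X` is a hypothetical feature vector for the part of the task an utterance refers to, and `y` is the sentiment that aspect-based sentiment analysis (not shown) would assign to it. The solver uses the normal equations with Gaussian elimination to stay dependency-free.

```python
from typing import List, Sequence


def least_squares(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Solve (X^T X) w = X^T y by Gaussian elimination with partial pivoting.

    Assumes X^T X is invertible (features not collinear).
    """
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


# Hypothetical features (e.g., "is red", "is far") and sentiment scores
# generated by a teacher whose true reward weights are [2, -1].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
print(least_squares(X, y))  # recovered reward weights, approximately [2.0, -1.0]
```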

#18 Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse Reward Learning with Iterative Reasoning and Cumulative Prospect Theory

Authors: Ran Tian ; Liting Sun ; Masayoshi Tomizuka

Classical game-theoretic approaches for multi-agent systems, in both the forward policy design problem and the inverse reward learning problem, often make strong rationality assumptions: agents perfectly maximize expected utilities under uncertainties. Such assumptions, however, substantially mismatch observed human behaviors such as satisficing with sub-optimal decisions, risk seeking, and loss aversion. Drawing on iterative reasoning models and cumulative prospect theory, we propose a new game-theoretic framework, the bounded risk-sensitive Markov Game (BRSMG), that captures two aspects of realistic human behaviors: bounded intelligence and risk-sensitivity. General solutions to both the forward policy design problem and the inverse reward learning problem are provided with theoretical analysis and simulation verification. We validate the proposed forward policy design algorithm and the inverse reward learning algorithm in a two-player navigation scenario. The results show that agents demonstrate bounded-intelligence, risk-averse, and risk-seeking behaviors in our framework. Moreover, in the inverse reward learning task, the proposed bounded risk-sensitive inverse learning algorithm outperforms a baseline risk-neutral inverse learning algorithm by effectively learning not only more accurate reward values but also the intelligence levels and risk-measure parameters of agents from demonstrations.
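
The risk-sensitivity ingredient, cumulative prospect theory, combines an S-shaped value function with an inverse-S probability weighting function. The sketch below uses the functional forms and median parameter estimates from Tversky and Kahneman (1992); the BRSMG paper may use different parameters, and only the simple case of one gain and one loss outcome is handled, where the cumulative decision weights reduce to w(p).

```python
# Tversky & Kahneman (1992) median estimates; assumed here, not taken from the paper.
ALPHA, BETA, LAMBDA = 0.88, 0.88, 2.25
GAMMA_GAIN, GAMMA_LOSS = 0.61, 0.69


def value(x: float) -> float:
    """S-shaped value: concave for gains, convex and steeper (loss-averse) for losses."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** BETA


def weight(p: float, gamma: float) -> float:
    """Inverse-S probability weighting: overweights small, underweights large p."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


def cpt_mixed(gain: float, p_gain: float, loss: float, p_loss: float) -> float:
    """CPT value of a prospect with one gain and one loss outcome (any remaining
    probability mass on 0). With a single outcome on each side, the cumulative
    decision weights are just w(p_gain) and w(p_loss)."""
    return (weight(p_gain, GAMMA_GAIN) * value(gain)
            + weight(p_loss, GAMMA_LOSS) * value(loss))


# A fair coin flip for +/-100 has expected value 0 but a negative CPT value,
# so a CPT agent declines it (loss aversion).
print(cpt_mixed(100.0, 0.5, -100.0, 0.5))
```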

#19 Content Learning with Structure-Aware Writing: A Graph-Infused Dual Conditional Variational Autoencoder for Automatic Storytelling

Authors: Meng-Hsuan Yu ; Juntao Li ; Zhangming Chan ; Rui Yan ; Dongyan Zhao

Recent automatic storytelling methods mainly rely on keyword planning or plot skeleton generation to model long-range dependencies and create consistent narrative texts. However, these approaches generate story plans or plots sequentially, leaving the non-sequential conception and structural design processes of human writers unexplored. To mimic human writers and exploit the fine-grained, intrinsic structural information of each story, we decompose automatic story generation into sub-problems of graph construction, graph generation, and graph-infused sequence generation. Specifically, we propose a graph-infused dual conditional variational autoencoder model to capture multi-level intra-story structures (i.e., graphs) by continuous variational latent variables and generate consistent stories through dual infusion of story structure planning and content learning. Experimental results on the ROCStories dataset and the CMU Movie Summary corpus confirm that our proposed model outperforms strong baselines in both human judgments and widely used automatic metrics.

#20 A Continual Learning Framework for Uncertainty-Aware Interactive Image Segmentation

Authors: Ervine Zheng ; Qi Yu ; Rui Li ; Pengcheng Shi ; Anne Haake

Deep learning models have achieved state-of-the-art performance in semantic image segmentation, but the results of fully automatic algorithms are not always guaranteed to be satisfactory to users. Interactive segmentation offers a solution by accepting user annotations on selected areas of the images to refine the segmentation results. However, most existing models focus only on correcting the current image's misclassified pixels, with no knowledge carried over to other images. In this work, we formulate interactive image segmentation as a continual learning problem and propose a framework to effectively learn from user annotations, aiming to improve the segmentation on both the current image and unseen images in future tasks while avoiding deteriorated performance on previously seen images. The framework employs a probabilistic mask to control the neural network's kernel activation and extract the most suitable features for segmenting images in each task. We also apply a task-aware embedding to automatically infer the optimal kernel activation for initial segmentation and subsequent refinement. Interactions with users are guided through multi-source uncertainty estimation so that users can focus on the most important areas to minimize the overall manual annotation effort. Experiments are performed on both medical and natural image datasets to illustrate the proposed framework's effectiveness on basic segmentation performance, forward knowledge transfer, and backward knowledge transfer.

#21 Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach

Authors: Suping Zhou ; Jia Jia ; Zhiyong Wu ; Zhihan Yang ; Yanfeng Wang ; Wei Chen ; Fanbo Meng ; Shuo Huang ; Jialie Shen ; Xiaochuan Wang

Effective emotion inference from user queries helps to give a more personified response for Voice Dialogue Applications (VDAs). The tremendous number of VDA users brings diverse emotion expressions. How can high emotion-inference performance be achieved on large-scale Internet voice data in VDAs? Traditionally, research on speech emotion recognition has been based on acted voice datasets, which have limited speakers but strong and clear emotion expressions. Inspired by this, in this paper, we propose a novel approach that leverages acted voice data with strong emotion expressions to enhance large-scale unlabeled Internet voice data with diverse emotion expressions for emotion inference. Specifically, we propose a novel semi-supervised multi-modal curriculum augmentation deep learning framework. First, to learn more general emotion cues, we adopt a curriculum-learning-based epoch-wise training strategy, which trains our model guided by strong and balanced emotion samples from acted voice data and subsequently leverages weak and unbalanced emotion samples from Internet voice data. Second, to employ more diverse emotion expressions, we design a Multi-path Mix-match Multimodal Deep Neural Network (MMMD), which effectively learns feature representations for multiple modalities and trains on labeled and unlabeled data in hybrid semi-supervised methods for superior generalization and robustness. Experiments on an Internet voice dataset with 500,000 utterances show that our method outperforms several alternative baselines (+10.09% in terms of F1), while an acted corpus with 2,397 utterances contributes 4.35%. To further compare our method with state-of-the-art techniques on traditional acted voice datasets, we also conduct experiments on the public IEMOCAP dataset. The results reveal the effectiveness of the proposed approach.